73305.P001 



UNITED STATES PATENT APPLICATION FOR
INTEGRATED CIRCUIT I/O USING A HIGH
PERFORMANCE BUS INTERFACE



Inventors:
Michael Farmwald
Mark Horowitz




Prepared by: 



^MM»d ^ A, 5^ farv«r« 

6w »wi Row th^ RBanigngr r " 

April 



BROWN & BAIN 
600 Hansen Way 
Suite 100 
Palo Alto. California 94306 
(415) 856-9411 



Our Refi SanBus D-i 
Jnteqrated Clreii<» t ^q jsBlna m 

An integrated circuit bus interface for computer and
video systems is described which allows high speed transfer of
blocks of data, particularly to and from memory devices, with 
reduced power consumption and increased system reliability. A 
new method of physically implementing the bus architecture is 
also described. 



BACKGROUND OF THE INVENTION

Semiconductor computer memories have traditionally been
designed and structured to use one memory device for each bit, or
small group of bits, of any individual computer word, where the
word size is governed by the choice of computer. Typical word
sizes range from 4 to 64 bits. Each memory device typically is
connected in parallel to a series of address lines and connected
to one of a series of data lines. When the computer seeks to
read from or write to a specific memory location, an address is
put on the address lines and some or all of the memory devices
are activated using a separate device select line for each needed
device. One or more devices may be connected to each data line
but typically only a small number of data lines are connected to
a single memory device. [illegible] Data is thus accessed or
provided in parallel for each memory read or write operation.
For the system to operate properly, every single memory bit in
every memory device must operate dependably and correctly.

To understand the concept of the present invention, it
is helpful to review the architecture of conventional memory
devices. Internal to nearly all types of memory devices
(including the most widely used Dynamic Random Access Memory
(DRAM), Static RAM (SRAM) and Read Only Memory (ROM) devices), a
large number of bits are accessed in parallel each time the
system carries out a memory access cycle. However, only a small
percentage of the accessed bits which are available internally
each time the memory device is cycled ever make it across the
device boundary to the external world.

Referring to Fig. 1, modern DRAM, SRAM and ROM
designs have internal architectures with row (word) lines 5 and
column (bit) lines 6 to allow the memory cells to tile a two
dimensional area 1. One bit of data is stored at the
intersection of each word and bit line. When a particular word
line is enabled, all of the corresponding data bits are
transferred onto the bit lines. Some prior art DRAMs take
advantage of this organization to reduce the number of pins
needed to transmit the address. The address of a given memory
cell is split into two addresses, row and column, each of which
can be multiplexed over a bus only half as wide as the memory
cell address of the prior art would have required.
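
The row and column split described above can be sketched in a few
lines. This is only an illustrative model: the function names and
the choice of the upper half of the bits as the row are
assumptions, not details taken from the text.

```python
def split_address(cell_address: int, address_bits: int) -> tuple:
    """Split a cell address into (row, column) halves of equal width.

    Assumption: the upper half of the bits selects the row (word line)
    and the lower half selects the column (bit line).
    """
    if address_bits <= 0 or address_bits % 2:
        raise ValueError("address_bits must be a positive even number")
    if not 0 <= cell_address < (1 << address_bits):
        raise ValueError("cell_address out of range")
    half = address_bits // 2
    row = cell_address >> half
    column = cell_address & ((1 << half) - 1)
    return row, column


def join_address(row: int, column: int, address_bits: int) -> int:
    """Rebuild the full cell address from its two multiplexed halves."""
    half = address_bits // 2
    return (row << half) | column


if __name__ == "__main__":
    # An 8-bit cell address travels as two 4-bit halves.
    row, column = split_address(0b10110110, 8)
    print(row, column)
```

Each half needs only address_bits / 2 lines, which is the pin
saving the paragraph describes.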

COMPARISON WITH PRIOR ART 

Prior art memory systems have attempted to solve the
problem of high speed access to memory with limited success.
U.S. Patent No. 3,821,715 (Hoff et. al.), was issued to Intel
Corporation for the earliest 4-bit microprocessor. That patent
describes a bus connecting a single central processing unit (CPU)
with multiple RAMs and ROMs. That bus multiplexes addresses and
data over a 4-bit wide bus and uses point-to-point control
signals to select particular RAMs or ROMs. The access time is
fixed and only a single processing element is permitted. There
is no block-mode type of operation, and most important, not all
of the interface signals between the devices are bused (the ROM
and RAM control lines and the RAM select lines are
point-to-point).

In U.S. Patent No. 4,315,308 (Jackson), a bus
connecting a single CPU to a bus interface unit is described.
The invention uses multiplexed address, data, and control
information over a single 16-bit wide bus. Block-mode operations
are defined, with the length of the block sent as part of the
control sequence. In addition, variable access-time operations
using a "stretch" cycle signal are provided. There are no
multiple processing elements and no capability for multiple
[illegible] are bused.

In U.S. Patent No. 4,449,207 (Kung, et. al.), a DRAM is
described which multiplexes address and data on an internal bus.
The external interface to this DRAM is conventional, with
separate control, address and data connections.

In U.S. Patent Nos. 4,764,846 and 4,706,166 (Go), a 3-D
package arrangement of stacked die with connections along a
single edge is described. Such packages are difficult to use
because of the point-to-point wiring required to interconnect
conventional memory devices with processing elements. Both
patents describe complex schemes for solving these problems. No
attempt is made to solve the problem by changing the interface.

In U.S. Patent No. 3,969,706 (Proebsting, et. al.), the
current state-of-the-art DRAM interface is described. The
address is two-way multiplexed, and there are separate pins for
data and control (RAS, CAS, WE, CS). The number of pins grows
with the size of the DRAM, and many of the connections must be
made point-to-point in a memory system using such DRAMs.

There are many backplane buses described in the prior
art, but not in the combination described or having the features
of this invention. Many backplane buses multiplex addresses and
data on a single bus (e.g., the NU bus). ELXSI and others have
implemented split-transaction buses (U.S. Patent Nos. 4,595,923
and 4,481,625 (Roberts)). ELXSI has also implemented a
relatively low-voltage-swing current-mode ECL driver
(approximately 1 V swing). Address-space registers are
implemented on most backplane buses, as is some form of block
mode operation.

Nearly all modern backplane buses implement some type
of arbitration scheme, but the arbitration scheme used in this
invention differs from each of these. U.S. Patent Nos. 4,837,682
(Culler), 4,818,985 (Ikeda), 4,779,089 (Theus) and 4,745,548
(Blahut) describe prior art schemes. All involve either log N
extra signals (Theus, Blahut), where N is the number of
potential bus requestors, or additional delay to get control of
the bus (Ikeda, Culler). None of the buses described in patents
or other literature use only bused connections. All contain some
point-to-point connections on the backplane. None of the other
aspects of this invention, such as power reduction by fetching
each data block from a single device or compact and low-cost 3-D
packaging, even apply to backplane buses.

The clocking scheme used in this invention has not been
used before and in fact would be difficult to implement in
backplane buses due to the signal degradation caused by connector
stubs. U.S. Patent No. 4,247,817 (Heller) describes a clocking
scheme using [illegible] ramp-shaped clock signals, in contrast
to the normal rise-time signals used in the present invention.

In U.S. Patent No. 4,646,270 (Voss), a video RAM is
described which implements a parallel-load, serial-out shift
register on the output of a DRAM. This generally allows greatly
improved bandwidth (and has been extended to 2, 4 and greater
width shift-out paths). The rest of the interfaces to the DRAM
(RAS, CAS, multiplexed address, etc.) remain the same as for
conventional DRAMs.

One object of the present invention is to use a new bus 
interface built into semiconductor devices to support high-speed 
access to large blocks of data from a single memory device by an 
external user of the data, such as a microprocessor, in an 
efficient and cost-effective manner. 

Another object of this invention is to provide a
clocking scheme to permit high speed clock signals to be sent
along the bus with minimal clock skew between devices.

Another object of this invention is to allow mapping 
out defective memory devices or portions of memory devices. 

Another object of this invention is to provide a method 
for distinguishing otherwise identical devices by assigning a 
unique identifier to each device. 

Yet another object of this invention is to provide a
method for transferring address, data and control information
over a relatively narrow bus and to provide a method of bus
arbitration when multiple devices seek to use the bus
simultaneously.

Another object of this invention is to provide a method
of distributing a high-speed memory cache within the DRAM chips
of a memory system which is much more effective than previous
cache methods.

Another object of this invention is to provide devices, 
especially DRAMs, suitable for use with the bus architecture of 
the invention. 

SUMMARY OF INVENTION

The present invention includes a memory subsystem
comprising at least two semiconductor devices, including at least
one memory device, connected in parallel to a bus, where the bus
includes a plurality of bus lines for carrying substantially all
address, data and control information needed by said memory
devices, where the control information includes device-select
information and the bus has substantially fewer bus lines than
the number of bits in a single address, and the bus carries
device-select information without the need for separate
device-select lines connected directly to individual devices.

Referring to Fig. 2, a standard DRAM 13, 14, ROM (or
SRAM) 12, microprocessor CPU 11, I/O device, disk controller or
other special purpose device such as a high speed switch is
modified to use a wholly bus-based interface rather than the
prior art combination of point-to-point and bus-based wiring used
with conventional versions of these devices. The new bus
includes clock signals, power and multiplexed address, data and
control signals. In a preferred implementation, 8 bus data lines
and an AddressValid bus line carry address, data and control
information for memory addresses up to 40 bits wide. Persons
skilled in the art will recognize that 16 bus data lines or other
numbers of bus data lines can be used to implement the teaching
of this invention. The new bus is used to connect elements such
as memory, peripheral, switch and processing units.

In the system of this invention, DRAMs and other
devices receive address and control information over the bus and
transmit or receive requested data over the same bus. Each
memory device contains only a single bus interface with no other
signal pins. Other devices that may be included in the system
can connect to the bus and other non-bus lines, such as
input/output lines. The bus supports large data block transfers
and split transactions to allow a user to achieve high bus
utilization. This ability to rapidly read or write a large block
of data to one single device at a time is an important advantage
of this invention.

The DRAMs that connect to this bus differ from
conventional DRAMs in a number of ways. Registers are provided
which may store control information, device identification,
device-type and other information appropriate for the chip such
as the address range for each independent portion of the device.
Bus interface circuits must be added and the internals of
prior art DRAM devices need to be modified so they can provide
and accept data to and from the bus at the peak data rate of the
bus. This requires changes to the column access circuitry in the
DRAM, with only a minimal increase in die size. A circuit is
provided to generate a low skew internal device clock for devices
on the bus, and other circuits provide for demultiplexing input
and multiplexing output signals.

High bus bandwidth is achieved by running the bus at a
very high clock rate (hundreds of MHz). This high clock rate is
made possible by the constrained environment of the bus. The bus
lines are controlled-impedance, doubly-terminated lines. For a
data rate of 500 MHz, the maximum bus propagation time is less
than 1 ns (the physical bus length is about 10 cm). In addition,
because of the packaging used, the pitch of the pins can be very
close to the pitch of the pads. The loading on the bus resulting
from the individual devices is very small. In a preferred
implementation, this generally allows stub capacitances of 1-2 pF
and inductances of 0.5 - 2 nH. Each device 15, 16, 17, shown in
Figure 3, only has pins on one side and these pins connect
directly to the bus 18. A transceiver device 19 can be included
to interface multiple units to a higher order bus through
pins 20.

A primary result of the architecture of this invention
is to increase the bandwidth of DRAM access. The invention also
[illegible] and increases packing density and system reliability.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a diagram which illustrates the basic 2-D
organization of memory devices.

Figure 2 is a schematic block diagram which illustrates
the parallel connection of all bus lines and the serial Reset
line to each device in the system.

Figure 3 is a perspective view of a system of the
invention which illustrates the 3-D packaging of semiconductor
devices on the primary bus.

Figure 4 shows the format of a request packet.

Figure 5 shows the format of a retry response from a
slave.

Figure 6 shows the bus cycles after a request packet
collision occurs on the bus and how arbitration is handled.

Figure 7 shows the timing whereby signals from two
devices can overlap temporarily and drive the bus at the same
time.

Figure 8 shows the connection and timing between bus
clocks and devices on the bus.

Figure 9 is a perspective view showing how transceivers
can be used to connect a number of bus units to a transceiver
bus.

Figure 10 is a block and schematic diagram of
input/output circuitry used to connect devices to the bus.

Figure 11 is a schematic diagram of a clocked
sense-amplifier used as a bus input receiver.

Figure 12 is a block diagram showing how the internal
device clock is generated from two bus clock signals using a set
of adjustable delay lines.

Figure 13 is a timing diagram showing the relationship
of signals in the block diagram of Figure 12.

Figure 14 is a timing diagram of a preferred means of
implementing the reset procedure of this invention.

Figure 15 is a diagram illustrating the general
organization of a 4 Mbit DRAM divided into 8 subarrays.

DETAILED DESCRIPTION

The present invention is designed to provide a high
speed, multiplexed bus for communication between processing
devices and memory devices and to provide devices adapted for use
in the bus system. The invention can also be used to connect
processing devices and other devices, such as I/O interfaces or
disk controllers, with or without memory devices on the bus. The
bus consists of a relatively small number of lines connected in
parallel to each device on the bus. The bus carries
substantially all address, data and control information needed by
devices for communication with other devices on the bus. In many
systems using the present invention, the bus carries almost every
signal between every device in the entire system. There is no
need for separate device-select lines since device-select
information for each device on the bus is carried over the bus.
There is no need for separate address and data lines because
address and data information can be sent over the same lines.
Using the organization described herein, very large addresses (40
bits in the preferred implementation) and large data blocks (1024
bytes) can be sent over a small number of bus lines (8 plus one
control line in the preferred implementation).

Virtually all of the signals needed by a computer
system can be sent over the bus. Persons skilled in the art
recognize that certain devices, such as CPUs, may be connected to
other signal lines and possibly to independent buses, for example
a bus to an independent cache memory, in addition to the bus of
this invention. Certain devices, for example cross-point
switches, could be connected to multiple, independent buses of
this invention. In the preferred implementation, memory devices
are provided that have no connections other than the bus
connections described herein and CPUs are provided that use the
bus of this invention as the principal, if not exclusive,
connection to memory and to other devices on the bus.

All modern DRAM, SRAM and ROM designs have internal
architectures with row (word) and column (bit) lines to
efficiently tile a 2-D area. Referring to Fig. 1, one bit of
data is stored at the intersection of each word line 5 and bit
line 6. When a particular word line is enabled, all of the
corresponding data bits are transferred onto the bit lines. This
data, about 4000 bits at a time in a 4 Mbit DRAM, is then loaded
into column sense amplifiers 3 and held for use by the I/O
circuits.

In the invention presented here, the data from the
sense amplifiers is enabled 32 bits at a time onto an internal
device bus running at approximately 125 MHz. This internal
device bus moves the data to the periphery of the devices where
the data is multiplexed into an 8-bit wide external bus
interface, running at approximately 500 MHz.
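
The figures in the preceding paragraph balance: a 32-bit path at
125 MHz and an 8-bit path at 500 MHz carry the same number of bits
per second, so a 4:1 multiplexer can join them. The sketch below
checks that arithmetic and shows one way a 32-bit word could be
serialized. The constant and function names, and the
least-significant-byte-first order, are assumptions for
illustration only.

```python
INTERNAL_WIDTH_BITS = 32
INTERNAL_RATE_MHZ = 125
EXTERNAL_WIDTH_BITS = 8
EXTERNAL_RATE_MHZ = 500


def throughput_mbit_s(width_bits: int, rate_mhz: int) -> int:
    """Raw throughput of a parallel path in Mbit/s."""
    return width_bits * rate_mhz


def serialize_word(word: int) -> list:
    """Break one 32-bit internal word into four external bytes.

    Assumption: the least significant byte is sent first; the text
    does not specify the ordering.
    """
    if not 0 <= word < (1 << INTERNAL_WIDTH_BITS):
        raise ValueError("word must fit in 32 bits")
    count = INTERNAL_WIDTH_BITS // EXTERNAL_WIDTH_BITS
    return [(word >> (EXTERNAL_WIDTH_BITS * i)) & 0xFF for i in range(count)]


if __name__ == "__main__":
    internal = throughput_mbit_s(INTERNAL_WIDTH_BITS, INTERNAL_RATE_MHZ)
    external = throughput_mbit_s(EXTERNAL_WIDTH_BITS, EXTERNAL_RATE_MHZ)
    print(internal, external)
    print([hex(b) for b in serialize_word(0x11223344)])
```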

The bus architecture of this invention connects master
or bus controller devices, such as CPUs, Direct Memory Access
devices (DMAs) or Floating Point Units (FPUs), and slave devices,
such as DRAM, SRAM or ROM memory devices. A slave device
responds to control signals; a master sends control signals.
Persons skilled in the art realize that some devices may behave
as both master and slave at various times, depending on the mode
of operation and the state of the system. For example, a memory
device will typically have only slave functions, while a DMA
controller, disk controller or CPU may include both slave and
master functions. Many other semiconductor devices, including
I/O devices, disk controllers, or other special purpose devices
such as high speed switches can be modified for use with the bus
of this invention.

Each semiconductor device contains a set of internal
registers, preferably including a device identification (device
ID) register, a device-type descriptor register, control
registers and other registers containing other information
relevant to that type of device. In a preferred implementation,
semiconductor devices connected to the bus contain registers
which specify the memory addresses contained within that device
and access-time registers which store a set of one or more delay
times at which the device can or should be available to send or
receive data.

Most of these registers can be modified and preferably
are set as part of an initialization sequence that occurs when
the system is powered up or reset. During the initialization
sequence each device on the bus is assigned a unique device ID
number, which is stored in the device ID register. A bus master
can then use these device ID numbers to access and set
appropriate registers in other devices, including access-time
registers, control registers, and memory registers, to configure
the system. Each slave may have one or several access-time
registers (four in a preferred embodiment). In a preferred
embodiment, one access-time register in each slave is permanently
or semi-permanently programmed with a fixed value to facilitate
certain control functions. A preferred implementation of an
initialization sequence is described below in more detail.

All information sent between master devices and slave
devices is sent over the external bus, which, for example, may be
8 bits wide. This is accomplished by defining a protocol whereby
a master device, such as a microprocessor, seizes exclusive
control of the external bus (i.e., becomes the bus master) and
initiates a bus transaction by sending a request packet (a
sequence of bytes comprising address and control information) to
one or more slave devices on the bus. An address can consist of
16 to 40 or more bits according to the teachings of this
invention. Each slave on the bus must decode the request packet
to see if that slave needs to respond to the packet. The slave
that the packet is directed to must then begin any internal
processes needed to carry out the requested bus transaction at
the requested time. The requesting master may also need to
transact certain internal processes before the bus transaction
begins. After a specified access time the slave(s) respond by
returning one or more bytes (8 bits) of data or by storing
information made available from the bus. More than one access
time can be provided to allow different types of responses to
occur at different times.

A request packet and the corresponding bus access are
separated by a selected number of bus cycles, allowing the bus to
be used in the intervening bus cycles by the same or other
masters for additional requests or brief bus accesses. Thus
multiple, independent accesses are permitted, allowing maximum
utilization of the bus for transfer of short blocks of data.
Transfers of long blocks of data use the bus efficiently even
without overlap because the overhead due to bus address, control
and access times is small compared to the total time to request
and transfer the block.
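
The efficiency argument above can be made concrete with a rough
model of a single, non-overlapped transfer. The sketch assumes one
byte per bus cycle on the 8-bit bus, a request packet of 6 bytes
(the preferred format described later in this document), and an
illustrative access delay of 8 cycles during which the bus sits
idle. The function name and the default values are assumptions
chosen for the example.

```python
def bus_efficiency(block_bytes: int,
                   request_cycles: int = 6,
                   access_cycles: int = 8) -> float:
    """Fraction of bus cycles that carry data in one isolated transfer.

    Assumptions: one byte moves per bus cycle, and nothing else uses
    the bus between the request packet and the data block.
    """
    if block_bytes <= 0:
        raise ValueError("block_bytes must be positive")
    total_cycles = request_cycles + access_cycles + block_bytes
    return block_bytes / total_cycles


if __name__ == "__main__":
    for size in (16, 64, 256, 1024):
        print(size, round(bus_efficiency(size), 3))
```

Under these assumptions a long block keeps the bus almost fully
occupied with data, while a short block leaves room that
interleaved requests from the same or other masters can fill.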

Device Address Mapping 

Another unique aspect of this invention is that each
memory device is a complete, independent memory subsystem with
all the functionality of a prior art memory board in a
conventional backplane-bus computer system. Individual memory
devices may contain a single memory section or may be subdivided
into more than one discrete memory section. Memory devices
preferably include memory address registers for each discrete
memory section. A failed memory device (or even a subsection of
a device) can be "mapped out" with only the loss of a small
fraction of the memory, maintaining essentially full system
capability. Mapping out bad devices can be accomplished in two
ways, both compatible with this invention.

The preferred method uses address registers in each
memory device (or independent discrete portion thereof) to store
information which defines the range of bus addresses to which
this memory device will respond. This is similar to prior art
schemes used in memory boards in conventional backplane bus
systems. The address registers can include a single pointer,
usually pointing to a block of known size, a pointer and a fixed
or variable block size value or two pointers, one pointing to the
beginning and one to the end (or to the "top" and "bottom") of
each memory block. By appropriate settings of the address
registers, a series of functional memory devices or discrete
memory sections can be made to respond to a contiguous range of
addresses, giving the system access to a contiguous block of good
memory, limited primarily by the number of good devices connected
to the bus. A block of memory in a first memory device or memory
section can be assigned a certain range of addresses, then a
block of memory in a next memory device or memory section can be
assigned addresses starting with an address one higher (or lower,
depending on the memory structure) than the last address of the
previous block.
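
The address-register scheme above can be sketched as a master
walking the list of memory sections and giving each functional
section the next free range. This is a minimal model using the
"pointer plus block size" variant; the class, field and function
names are hypothetical, and a base of None stands in for a section
that has been mapped out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MemorySection:
    device_id: int
    size_bytes: int
    functional: bool
    base: Optional[int] = None  # address register; None means mapped out


def assign_contiguous(sections: List[MemorySection], start: int = 0) -> int:
    """Give every functional section the next contiguous address range.

    Returns the first address past the end of the good memory.
    """
    next_base = start
    for section in sections:
        if section.functional:
            section.base = next_base
            next_base += section.size_bytes
        else:
            section.base = None
    return next_base


def responds_to(section: MemorySection, address: int) -> bool:
    """True if the address falls inside the section's assigned range."""
    if section.base is None:
        return False
    return section.base <= address < section.base + section.size_bytes


if __name__ == "__main__":
    half_mib = 512 * 1024
    sections = [MemorySection(1, half_mib, True),
                MemorySection(2, half_mib, False),
                MemorySection(3, half_mib, True)]
    print(assign_contiguous(sections), [s.base for s in sections])
```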

Preferred devices for use in this invention include
device-type register information specifying the type of chip,
including how much memory is available in what configuration on
that device. A master can perform an appropriate memory test,
such as reading and writing each memory cell in one or more
selected orders, to test proper functioning of each accessible
discrete portion of memory (based in part on information like
device ID number and device-type) and write address values (up to
40 bits in the preferred embodiment, 10^12 bytes), preferably
contiguous, into device address-space registers. Non-functional
or impaired memory sections can be assigned a special address
value which the system can interpret to avoid using that memory.

The second approach puts the burden of avoiding the bad
devices on the system master or masters. CPUs and DMA
controllers typically have some sort of translation look-aside
buffers (TLBs) which map virtual to physical (bus) addresses.
With relatively simple software, the TLBs can be programmed to
use only working memory (data structures describing functional
memories are easily generated). For masters which don't contain
TLBs (for example, a video display generator), a small, simple
RAM can be used to map a contiguous range of addresses onto the
addresses of the functional memory devices.
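
The master-side approach can be pictured as a small lookup table
that skips failed memory. In the sketch below the table is a list
indexed by contiguous page number whose entries are the bus
addresses of pages that passed a memory test. The page size, the
function names and the shape of the test results are assumptions
made for illustration.

```python
PAGE_BYTES = 4096  # hypothetical mapping granularity


def build_page_map(test_results: dict) -> list:
    """Build the map from memory-test results.

    test_results maps a physical page base address to True (good) or
    False (bad). The returned list holds the good bases in ascending
    order, so index i is the i-th page of the contiguous range.
    """
    return sorted(base for base, good in test_results.items() if good)


def translate(page_map: list, address: int) -> int:
    """Map an address in the contiguous range onto working memory."""
    if address < 0:
        raise IndexError("address must be non-negative")
    page, offset = divmod(address, PAGE_BYTES)
    if page >= len(page_map):
        raise IndexError("address beyond the end of working memory")
    return page_map[page] + offset


if __name__ == "__main__":
    results = {0: True, 4096: False, 8192: True}
    page_map = build_page_map(results)
    print(page_map, translate(page_map, 4100))
```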

Either scheme works and permits a system to have a
significant percentage of non-functional devices and still
continue to operate with the memory which remains. This means
that systems built with this invention will have much improved
reliability over existing systems, including the ability to build
systems with almost no field failures.



Bus 



The preferred bus architecture of this invention
comprises 11 signals: BusData[0:7]; AddrValid; Clk1 and Clk2;
plus an input reference level and power and ground lines
connected in parallel to each device. Signals are driven onto
the bus during conventional bus cycles. The notation
"Signal[i:j]" refers to a specific range of signals or lines, for
example, BusData[0:7] means BusData0, BusData1, ..., BusData7.
The bus lines for BusData[0:7] signals form a byte-wide,
multiplexed data/address/control bus. AddrValid is used to
indicate when the bus is holding a valid address request, and
instructs a slave to decode the bus data as an address and, if
the address is included on that slave, to handle the pending
request. The two clocks together provide a synchronized, high
speed clock for all the devices on the bus. In addition to the
bused signals, there is one other line (ResetIn, ResetOut)
connecting each device in series for use during initialization to
assign every device in the system a unique device ID number
(described below in detail).

To facilitate the extremely high data rate of this 
external bus relative to the gate delays of the internal logic, 
the bus cycles are grouped into pairs of even/odd cycles. Note 
that all devices connected to a bus should preferably use the 
same even/odd labeling of bus cycles and preferably should begin 
operations on even cycles. This is enforced by the clocking 
scheme. 

Protocol and Bus Operation 

The bus uses a relatively simple, synchronous,
split-transaction, block-oriented protocol for bus transactions.
One of the goals of the system is to keep the intelligence
concentrated in the masters, thus keeping the slaves as simple as
possible (since there are typically many more slaves than
masters). To reduce the complexity of the slaves, a slave should
preferably respond to a request in a specified time, sufficient
to allow the slave to begin or possibly complete a
device-internal phase including any internal actions that must
precede the subsequent bus access phase. The time for this bus
access phase is known to all devices on the bus, each master being
responsible for making sure that the bus will be free when the
bus access begins. Thus the slaves never worry about arbitrating
for the bus. This approach eliminates arbitration in single
master systems, and also makes the slave-bus interface simpler.

In a preferred implementation of the invention, to
initiate a bus transfer over the bus, a master sends out a
request packet, a contiguous series of bytes containing address
and control information. It is preferable to use a request
packet containing an even number of bytes and also preferable to
start each packet on an even bus cycle.

The device-select function is handled using the bus
data lines. AddrValid is driven, which instructs all slaves to
decode the request packet address, determine whether they contain
the requested address, and if they do, provide the data back to
the master (in the case of a read request) or accept data from
the master (in the case of a write request) in a data block
transfer. A master can also select a specific device by
transmitting a device ID number in a request packet. In a
preferred implementation, a special device ID number is chosen to
indicate that the packet should be interpreted by all devices on
the bus. This allows a master to broadcast a message, for
example to set a selected control register of all devices with
the same value.
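
The selection rules in the paragraph above amount to a small
decision each slave makes for every request packet. The sketch
below models that decision only; the function name, the argument
layout and the broadcast value of 0 are hypothetical, since the
text does not say which device ID number is reserved for
broadcast.

```python
from typing import Optional

BROADCAST_ID = 0  # hypothetical; the reserved value is not given in the text


def slave_accepts(own_id: int, base: int, size: int,
                  target_id: Optional[int] = None,
                  address: Optional[int] = None) -> bool:
    """Decide whether a slave should act on a decoded request packet.

    A packet selects a slave either by device ID (including the
    broadcast ID) or by an address inside the slave's own range.
    """
    if target_id is not None:
        return target_id == own_id or target_id == BROADCAST_ID
    if address is not None:
        return base <= address < base + size
    return False


if __name__ == "__main__":
    # Slave 5 holds addresses 0x1000 to 0x1FFF.
    print(slave_accepts(5, 0x1000, 0x1000, address=0x1800))
    print(slave_accepts(5, 0x1000, 0x1000, target_id=BROADCAST_ID))
    print(slave_accepts(5, 0x1000, 0x1000, target_id=7))
```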

The data block transfer occurs later at a time
specified in the request packet control information, preferably
beginning on an even cycle. A device begins a data block
transfer almost immediately with a device-internal phase as the
device initiates certain functions, such as setting up memory
addressing, before the bus access phase begins. The time after
which a data block is driven onto the bus lines is selected from
values stored in slave access-time registers. The timing of data
for reads and writes is preferably the same; the only difference
is which device drives the bus. For reads, the slave drives the
bus and the master latches the values from the bus. For writes
the master drives the bus and the selected slave latches the
values from the bus.

In a preferred implementation of this invention shown
in Figure 4, a request packet 22 contains 6 bytes of data: 4.5
address bytes and 1.5 control bytes. Each request packet uses
all nine bits of the multiplexed data/address lines (AddrValid 23
+ BusData[0:7] 24) for all six bytes of the request packet.
Setting 23 AddrValid [illegible] indicates the start of a
request packet (control information). In a valid request packet,
AddrValid 27 must be 0 in the last byte. Asserting this signal
in the last byte invalidates the request packet. This is used
for the collision detection and arbitration logic (described
below). Bytes 25-26 contain the first 35 address bits,
Address[0:35]. The last byte contains AddrValid 27 (the
invalidation switch) and 28, the remaining address bits,
Address[36:39], and BlockSize[0:3] (control information).

The first byte contains two 4 bit fields containing
control information, AccessType[0:3], an op code (operation code)
which, for example, specifies the type of access, and
Master[0:3], a position reserved for the master sending the
packet to include its master ID number. Only master numbers 1
through 15 are allowed; master number 0 is reserved for special
system commands. Any packet with Master[0:3] = 0 is an invalid
or special packet and is treated accordingly.
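
The two 4-bit fields of the first byte can be modelled with simple
shifts and masks. The text does not say which field occupies which
bus lines, so the sketch assumes AccessType sits in the low four
bits and Master in the high four bits; the function names are
hypothetical as well.

```python
def pack_first_byte(access_type: int, master: int) -> int:
    """Combine AccessType[0:3] and Master[0:3] into one byte.

    Assumption: AccessType is the low nibble, Master the high nibble.
    """
    if not 0 <= access_type <= 0xF:
        raise ValueError("access_type must fit in 4 bits")
    if not 1 <= master <= 0xF:
        raise ValueError("master must be 1..15; 0 is reserved")
    return (master << 4) | access_type


def unpack_first_byte(byte: int) -> tuple:
    """Return (access_type, master) from a first byte."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must fit in 8 bits")
    return byte & 0xF, (byte >> 4) & 0xF


def is_special_packet(byte: int) -> bool:
    """A Master field of 0 marks an invalid or special packet."""
    return unpack_first_byte(byte)[1] == 0


if __name__ == "__main__":
    first = pack_first_byte(0b0101, 3)
    print(hex(first), unpack_first_byte(first), is_special_packet(first))
```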

The AccessType field specifies whether the requested
operation is a read or write and the type of access, for example,
whether it is to the control registers or other parts of the
device, such as memory. In a preferred implementation,
AccessType[0] is a Read/Write switch: if it is a 1, then the
operation calls for a read from the slave (the slave to read the
requested memory block and drive the memory contents onto the
bus); if it is a 0, the operation calls for a write into the
slave (the slave to read data from the bus and write it to
memory). AccessType[1:3] provides up to 8 different access types
for a slave. AccessType[1:2] preferably indicates the timing of
the response, which is stored in an access-time register,
AccessRegN. The choice of access-time register can be selected
directly by having a certain op code select that register, or
indirectly by having a slave respond to selected op codes with
pre-selected access times (see table below). The remaining bit,
AccessType[3], may be used to send additional information about
the request to the slaves.

One special type of access is control register access,
which involves addressing a selected register in a selected
slave. In the preferred implementation of this invention,
AccessType[1:3] equal to zero indicates a control register
request and the address field of the packet indicates the desired
control register. For example, the most significant two bytes
can be the device ID number (specifying which slave is being
addressed) and the least significant three bytes can specify a
register address and may also represent or include data to be
loaded into that control register. Control register accesses are
used to initialize the access-time registers, so it is preferable
to use a fixed response time which can be preprogrammed or even
hard wired, for example the value in AccessReg0, preferably 8
cycles. Control register access can also be used to initialize
or modify other registers, including address registers.

The method of this invention provides for access mode
control specifically for the DRAMs. One such access mode
determines whether the access is page mode or normal RAS access.
In normal mode (in conventional DRAMs and in this invention), the
DRAM column sense amps or latches have been precharged to a value
intermediate between logical 0 and 1. This precharging allows
access to a row in the RAM to begin as soon as the access request
for either inputs (writes) or outputs (reads) is received and
allows the column sense amps to sense data quickly. In page mode
(both conventional and in this invention), the DRAM holds the
data in the column sense amps or latches from the previous read
or write operation. If a subsequent request to access data is
directed to the same row, the DRAM does not need to wait for the
data to be sensed (it has been sensed already) and access time
for this data is much shorter than the normal access time. Page
mode generally allows much faster access to data but to a smaller
block of data (equal to the number of sense amps). However, if
the requested data is not in the selected row, the access time is
longer than the normal access time, since the request must wait
for the RAM to precharge before the normal mode access can start.
Two access-time registers in each DRAM preferably contain the
access times to be used for normal and for page-mode accesses,
respectively.

The access mode also determines whether the DRAM should
precharge the sense amplifiers or should save the contents of the
sense amps for a subsequent page mode access. Typical settings
are "precharge after normal access" and "save after page mode
access" but "precharge after page mode access" or "save after
normal access" are allowed, selectable modes of operation. The
DRAM can also be set to precharge the sense amps if they are not
accessed for a selected period of time.

In page mode, the data stored in the DRAM sense
amplifiers may be accessed within much less time than it takes to
read out data in normal mode (~10-20 ns vs. 40-100 ns). This
data may be kept available for long periods. However, if these
sense amps (and hence bit lines) are not precharged after an
access, a subsequent access to a different memory word (row) will
suffer a precharge time penalty of about 40-100 ns because the
sense amps must precharge before latching in a new value.

The contents of the sense amps thus may be held and
used as a cache, allowing faster, repetitive access to small
blocks of data. DRAM-based page-mode caches have been attempted
in the prior art using conventional DRAM organizations but they
are not very effective because several chips are required per
computer word. Such a conventional page-mode cache contains many
bits (for example, 32 chips x 4Kbits) but has very few
independent storage entries. In other words, at any given point
in time the sense amps hold only a few different blocks or memory
"locales" (a single block of 4K words, in the example above).
Simulations have shown that upwards of 100 blocks are required to
achieve high hit rates (>90% of requests find the requested data
already in cache memory) regardless of the size of each block.
See, for example, Anant Agarwal, et al., "An Analytic Cache
Model," ACM Transactions on Computer Systems, Vol. 7(2), pp.
184-215 (May 1989).

The organization of memory in the present invention
allows each DRAM to hold one or more (4 for 4MBit DRAMs)
separately-addressed and independent blocks of data. A personal
computer or workstation with 100 such DRAMs (i.e. 400 blocks or
locales) can achieve extremely high, very repeatable hit rates
(98-99% on average) as compared to the lower (50-80%), widely
varying hit rates using DRAMs organized in the conventional
fashion. Further, because of the time penalty associated with
the deferred precharge on a "miss" of the page-mode cache, the
conventional DRAM-based page-mode cache generally has been found
to work less well than no cache at all.
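
The trade-off between hit rate and the deferred-precharge penalty can be illustrated with a short calculation. This is only a sketch: the three timing defaults are assumed values chosen from inside the 10-20 ns and 40-100 ns ranges quoted above, and the function names are hypothetical.

```python
def average_access_ns(hit_rate, page_hit_ns=15.0, normal_ns=70.0,
                      precharge_ns=70.0):
    """Mean access time of a page-mode cache with deferred precharge.

    A hit is served from the sense amps; a miss must first precharge
    and then perform a normal access. All timings are assumed values.
    """
    miss_ns = precharge_ns + normal_ns
    return hit_rate * page_hit_ns + (1.0 - hit_rate) * miss_ns


def break_even_hit_rate(page_hit_ns=15.0, normal_ns=70.0, precharge_ns=70.0):
    """Hit rate below which the cache is slower than always using
    normal (precharged) accesses."""
    miss_ns = precharge_ns + normal_ns
    return (miss_ns - normal_ns) / (miss_ns - page_hit_ns)


if __name__ == "__main__":
    for rate in (0.50, 0.80, 0.98):
        print(rate, average_access_ns(rate))
    print("break-even:", break_even_hit_rate())
```

With these assumed timings, a hit rate near the low end of the conventional range averages worse than a plain normal access, while a hit rate in the high nineties averages close to the page-mode access time.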
For DRAM slave access, the access types are preferably
used in the following way:

AccessType[1:3]   Use                       AccessTime

0                 Control Register Access   Fixed, 8 [AccessReg0]
1                 Unused                    Fixed, 8 [AccessReg0]
2-3               Unused                    AccessReg1
4-5               Page Mode DRAM access     AccessReg2
6-7               Normal DRAM access        AccessReg3
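
As an illustration of the table above, the following sketch maps an access type to its use and access-time register. It assumes the caller has already split the AccessType field into the read/write bit and a 3-bit code taken as an integer 0-7; the function and table names are hypothetical, and the bit ordering within the field is not specified here.

```python
# Use and access-time register for each AccessType[1:3] code (0-7),
# following the DRAM slave access table.
ACCESS_TABLE = {
    0: ("control register access", "AccessReg0"),
    1: ("unused", "AccessReg0"),
    2: ("unused", "AccessReg1"),
    3: ("unused", "AccessReg1"),
    4: ("page mode DRAM access", "AccessReg2"),
    5: ("page mode DRAM access", "AccessReg2"),
    6: ("normal DRAM access", "AccessReg3"),
    7: ("normal DRAM access", "AccessReg3"),
}


def decode_access_type(rw_bit, type_code):
    """Decode a request's access type.

    rw_bit    -- AccessType[0]: 1 = read from slave, 0 = write into slave
    type_code -- AccessType[1:3] as an integer 0-7 (assumed encoding)
    """
    if rw_bit not in (0, 1) or not 0 <= type_code <= 7:
        raise ValueError("rw_bit must be 0 or 1 and type_code 0-7")
    operation = "read" if rw_bit == 1 else "write"
    use, register = ACCESS_TABLE[type_code]
    return operation, use, register


if __name__ == "__main__":
    print(decode_access_type(1, 4))
```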

Persons skilled in the art will recognize that a series of
available bits could be designated as switches for controlling
these access modes. For example:

AccessType[2] = page mode/normal switch
AccessType[3] = precharge/save-data switch

BlockSize[0:3] specifies the size of the data block
transfer. If BlockSize[0] is 0, the remaining bits are the
binary representation of the block size (0-7). If BlockSize[0]
is 1, then the remaining bits give the block size as a binary
power of 2, from 8 to 1024. A zero-length block can be
interpreted as a special command, for example, to refresh a DRAM
without returning any data, or to change the DRAM from page mode
to normal access mode or vice-versa.

BlockSize[0:3]    Number of Bytes in Block

0-7               0-7 respectively
8                 8
9                 16
10                32
11                64
12                128
13                256
14                512
15                1024

Persons skilled in the art will recognize that other block size
encoding schemes or values can be used.
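
The block size encoding described above can be sketched as follows. The function name is hypothetical, and the sketch assumes the field has already been split into the BlockSize[0] flag and the remaining three bits taken as an integer 0-7.

```python
def block_size_bytes(flag, remaining):
    """Return the block size in bytes.

    flag      -- BlockSize[0]
    remaining -- the other three BlockSize bits as an integer 0-7
    """
    if flag not in (0, 1) or not 0 <= remaining <= 7:
        raise ValueError("flag must be 0 or 1 and remaining 0-7")
    if flag == 0:
        return remaining          # direct binary size, 0-7 bytes
    return 8 << remaining         # power of two, 8 to 1024 bytes


if __name__ == "__main__":
    print([block_size_bytes(0, n) for n in range(8)])
    print([block_size_bytes(1, n) for n in range(8)])
```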

In most cases, a slave will respond at the selected
access time by reading or writing data from or to the bus over
bus lines BusData[0:7] and AddrValid will be at logical 0. In a
preferred embodiment, substantially each memory access will
involve only a single memory device, that is, a single block will
be read from or written to a single memory device.

Retry Format 

In some cases, a slave may not be able to respond
correctly to a request, e.g., for a read or write. In such a
situation, the slave should return an error message, sometimes
called a N(o)ACK(nowledge) or retry message. The retry message
can include information about the condition requiring a retry,
but this increases system requirements for circuitry in both
slave and masters. A simple message indicating only that an
error has occurred allows for a less complex slave, and the
master can take whatever action is needed to understand and
correct the cause of the error.



For example, under certain conditions a slave might not
be able to supply the requested data. During a page-mode access,
the DRAM selected must be in page mode and the requested address
must match the address of the data held in the sense amps or
latches. Each DRAM can check for this match during a page-mode
access. If no match is found, the DRAM begins precharging and
returns a retry message to the master during the first cycle of
the data block (the rest of the returned block is ignored). The
master then must wait for the precharge time (which is set to
accommodate the type of slave in question, stored in a special
register, PreChargeReg), and then resend the request as a normal
DRAM access (AccessType = 6 or 7).

In the preferred form of the present invention, a slave
signals a retry by driving AddrValid true at the time the slave
was supposed to begin reading or writing data. A master which
expected to write to that slave must monitor AddrValid during the
write and take corrective action if it detects a retry message.
Figure 5 illustrates the format of a retry message 28 which is
useful for read requests, consisting of 23 AddrValid = 1 with
Master[0:3] = 0 in the first (even) cycle. Note that AddrValid
is normally 0 for data block transfers and that there is no
master 0 (only 1 through 15 are allowed). All DRAMs and masters
can easily recognize such a packet as an invalid request packet,
and therefore a retry message. In this type of bus transaction
all of the fields except for Master[0:3] and AddrValid 23 may be
used as information fields, although in the implementation
described, the contents are undefined. Persons skilled in the
art recognize that another method of signifying a retry message
is to add a DataInvalid line and signal to the bus. This signal
could be asserted in the case of a NACK.

Bus Arbitration

In the case of a single master, there are by definition
no arbitration problems. The master sends request packets and
keeps track of periods when the bus will be busy in response to
that packet. The master can schedule multiple requests so that
the corresponding data block transfers do not overlap.

The bus architecture of this invention is also useful
in configurations with multiple masters. When two or more
masters are on the same bus, each master must keep track of all
the pending transactions, so each master knows when it can send a
request packet and access the corresponding data block transfer.
Situations will arise, however, where two or more masters send a
request packet at about the same time and the multiple requests
must be detected, then sorted out by some sort of bus
arbitration.

There are many ways for each master to keep track of
when the bus is and will be busy. A simple method is for each
master to maintain a bus-busy data structure, for example by
maintaining two pointers, one to indicate the earliest point in
the future when the bus will be busy and the other to indicate
the earliest point in the future when the bus will be free, that
is, the end of the latest pending data block transfer. Using
this information, each master can determine whether and when
there is enough time to send a request packet (as described above
under Protocol) before the bus becomes busy with another data
block transfer and whether the corresponding data block transfer
will interfere with pending bus transactions. Thus each master
must read every request packet and update its bus-busy data
structure to maintain information about when the bus is and will
be free.
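
A minimal sketch of such a two-pointer bus-busy structure is given below. The class, field and method names are hypothetical, time is counted in bus cycles, and the sketch ignores details such as even-cycle alignment; it is one possible reading of the scheme, not a definitive implementation.

```python
from dataclasses import dataclass

REQUEST_CYCLES = 6  # a request packet occupies six bus cycles


@dataclass
class BusBusy:
    busy_start: int = 0  # earliest future cycle at which the bus is busy
    free_at: int = 0     # end of the latest pending data block transfer

    def can_send(self, now, access_cycles):
        """True if a request sent at `now` fits before the bus becomes
        busy and its data transfer starts after all pending transfers."""
        idle = now >= self.free_at
        packet_fits = idle or now + REQUEST_CYCLES <= self.busy_start
        data_start = now + REQUEST_CYCLES + access_cycles
        return packet_fits and data_start >= self.free_at

    def record(self, now, access_cycles, block_cycles):
        """Update the pointers for a request packet seen at `now`
        (called by every master for every request on the bus)."""
        data_start = now + REQUEST_CYCLES + access_cycles
        if now >= self.free_at:      # nothing pending: this is the earliest
            self.busy_start = data_start
        self.free_at = max(self.free_at, data_start + block_cycles)


if __name__ == "__main__":
    bus = BusBusy()
    bus.record(now=0, access_cycles=8, block_cycles=16)
    print(bus, bus.can_send(6, 8), bus.can_send(6, 18))
```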

With two or more masters on the bus, masters will
occasionally transmit independent request packets during the same
bus cycle. Those multiple requests will collide as each such
master drives the bus simultaneously with different information,
resulting in scrambled request information and neither desired
data block transfer. In a preferred form of the invention, each
device on the bus seeking to write a logical 1 on a BusData or
AddrValid line drives that line with a current sufficient to
sustain a voltage greater than or equal to the high-logic value
for the system. Devices do not drive lines that should have a
logical 0; those lines are simply held at a voltage corresponding
to a low-logic value. Each master tests the voltage on at least
some, preferably all, bus data and the AddrValid lines so the
master can detect a logical '1' where the expected level is '0'
on a line that it does not drive during a given bus cycle but
another master does drive.
Another way to detect collisions is to select one or
more bus lines for collision signalling. Each master sending a
request drives that line or lines and monitors the selected lines
for more than the normal drive current (or a logical value of
">1"), indicating requests by more than one master. Persons
skilled in the art will recognize that this can be implemented
with a protocol involving BusData and AddrValid lines or could be
implemented using an additional bus line.

In the preferred form of this invention, each master
detects collisions by monitoring lines which it does not drive to
see if another master is driving those lines. Referring to Fig.
4, the first byte of the request packet includes the number of
each master attempting to use the bus (Master[0:3]). If two
masters send packet requests starting at the same point in time,
the master numbers will be logical "or"ed together by at least
those masters, and thus one or both of the masters, by monitoring
the data on the bus and comparing what it sent, can detect a
collision. For instance if requests by masters number 2 (0010)
and 5 (0101) collide, the bus will be driven with the value
Master[0:3] = 7 (0010 + 0101 = 0111). Master number 5 will detect
that the signal Master[2] = 1 and master 2 will detect that
Master[1] and Master[3] = 1, telling both masters that a
collision has occurred. Another example is masters 2 and 11, for
which the bus will be driven with the value Master[0:3] = 11 (0010
+ 1011 = 1011), and although master 11 can't readily detect this
collision, master 2 can. When any collision is detected, each
master detecting a collision drives the value of AddrValid 27 in
byte 5 of the request packet 22 to 1, which is detected by all
masters, including master 11 in the second example above, and
forces a bus arbitration cycle, described below.
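
The two collision examples above can be reproduced with a short sketch that models the bus as the logical OR of the master numbers driven onto it. The function names are hypothetical, and the sketch works on whole 4-bit values rather than individually labelled Master[] lines.

```python
def wired_or(*master_numbers):
    """Value seen on the Master[0:3] lines when all listed masters
    drive their numbers at the same time (logical OR)."""
    bus = 0
    for number in master_numbers:
        bus |= number
    return bus


def detects_collision(own_number, bus_value):
    """A master detects a collision when it sees a 1 on a line
    that it did not drive itself."""
    return (bus_value & ~own_number) != 0


if __name__ == "__main__":
    for pair in ((2, 5), (2, 11)):
        bus = wired_or(*pair)
        print(pair, format(bus, "04b"),
              [detects_collision(m, bus) for m in pair])
```

In the second case only master 2 detects the collision directly; master 11 learns of it when master 2 drives AddrValid to 1 in byte 5.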

Another collision condition may arise where master A
sends a request packet in cycle 0 and master B tries to send a
request packet starting in cycle 2 of the first request packet,
thereby overlapping the first request packet. This will occur
from time to time because the bus operates at high speeds, thus
the logic in a second-initiating master may not be fast enough to
detect a request initiated by a first master in cycle 0 and to
react fast enough by delaying its own request. Master B
eventually notices that it wasn't supposed to try to send a
request packet (and consequently almost surely destroyed the
address that master A was trying to send), and, as in the example
above of a simultaneous collision, drives a 1 on AddrValid during
byte 5 of the first request packet 27 forcing an arbitration.
The logic in the preferred implementation is fast enough that a
master should detect a request packet by another master by cycle
3 of the first request packet, so no master is likely to attempt
to send a potentially colliding request packet later than
cycle 2.

Slave devices do not need to detect a collision directly,
but they must wait to do anything irrecoverable until the last
byte (byte 5) is read to ensure that the packet is valid. A
request packet with Master[0:3] equal to 0 (a retry signal) is
ignored and does not cause a collision. The subsequent bytes of
such a packet are ignored.

To begin arbitration after a collision, the masters
wait a preselected number of cycles after the aborted request
packet (4 cycles in a preferred implementation), then use the
next free cycle to arbitrate for the bus (the next available even
cycle in the preferred implementation). Each colliding master
signals to all other colliding masters that it seeks to send a
request packet, a priority is assigned to each of the colliding
masters, then each master is allowed to make its request in the
order of that priority.

Figure 6 illustrates one preferred way of implementing
this arbitration. Each colliding master signals its intent to
send a request packet by driving a single BusData line during a
single bus cycle corresponding to its assigned master number (1-
15 in the present example). During two-byte arbitration cycle
29, byte 0 is allocated to requests 1-7 from masters 1-7,
respectively, (bit 0 is not used) and byte 1 is allocated to
requests 8-15 from masters 8-15, respectively. At least one
device and preferably each colliding master reads the values on
the bus during the arbitration cycles to determine and store
which masters desire to use the bus. Persons skilled in the art
will recognize that a single byte can be allocated for
arbitration requests if the system includes more bus lines than
masters. More than 15 masters can be accommodated by using
additional bus cycles.

A fixed priority scheme (preferably using the master
numbers, selecting lowest numbers first) is then used to
prioritize, then sequence the requests in a bus arbitration queue
which is maintained by at least one device. These requests are
queued by each master in the bus-busy data structure and no
further requests are allowed until the bus arbitration queue is
cleared. Persons skilled in the art will recognize that other
priority schemes can be used, including assigning priority
according to the physical location of each master.
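
The two-byte arbitration cycle and the lowest-number-first priority can be sketched as follows. The function names are hypothetical, and the assignment of master m to bit (m mod 8) of byte (m div 8) is an assumption consistent with bit 0 of byte 0 being unused; the actual line assignment is given by Figure 6.

```python
def arbitration_bytes(colliding_masters):
    """Return (byte0, byte1) as driven during the arbitration cycle.

    Each colliding master drives a single line: masters 1-7 in byte 0
    (bit 0 unused) and masters 8-15 in byte 1 (assumed bit mapping).
    """
    lines = [0, 0]
    for master in colliding_masters:
        if not 1 <= master <= 15:
            raise ValueError("master numbers run from 1 to 15")
        lines[master // 8] |= 1 << (master % 8)
    return lines[0], lines[1]


def arbitration_queue(byte0, byte1):
    """Order the requesting masters by fixed priority, lowest first."""
    word = byte0 | (byte1 << 8)
    return [m for m in range(1, 16) if (word >> m) & 1]


if __name__ == "__main__":
    b0, b1 = arbitration_bytes([11, 2, 5])
    print(format(b0, "08b"), format(b1, "08b"), arbitration_queue(b0, b1))
```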

System Configuration/Reset 

In the bus-based system of this invention, a mechanism
is provided to give each device on the bus a unique device
identifier (device ID) after power-up or under other conditions
as desired or needed by the system. A master can then use this
device ID to access a specific device, particularly to set or
modify registers 170 of the specified device, including the
control and address registers. In the preferred embodiment, one
master is assigned to carry out the entire system configuration
process. The master provides a series of unique device ID numbers
for each unique device connected to the bus system. In the
preferred embodiment, each device connected to the bus contains a
special device-type register which specifies the type of device,
for instance CPU, 4 MBit memory, 64 MBit memory or disk
controller. The configuration master should check each device,
determine the device type and set appropriate control registers,
including access-time registers. The configuration master should
check each memory device and set all appropriate memory address
registers.

One means to set up unique device ID numbers is to have
each device select a device ID in sequence and store the value
in an internal device ID register 171. For example, a master can
pass sequential device ID numbers through shift registers in each
of a series of devices, or pass a token from device to device
whereby the device with the token reads in device ID information
from another line or lines. In a preferred embodiment, device ID
numbers are assigned to devices according to their physical
relationship, for instance, their order along the bus.

In a preferred embodiment of this invention, the device
ID setting is accomplished using a pair of pins on each device,
ResetIn and ResetOut. These pins handle normal logic signals and
are used only during device ID configuration. On each rising
edge of the clock, each device copies ResetIn (an input) into a
four-stage reset shift register. The output of the reset shift
register is connected to ResetOut, which in turn connects to
ResetIn for the next sequentially connected device.
Substantially all devices on the bus are thereby daisy-chained
together. A first reset signal, for example, while ResetIn at a
device is a logical 1, or when a selected bit of the reset shift
register goes from zero to non-zero, causes the device to hard
reset, for example by clearing all internal registers and
resetting all state machines. A second reset signal, for
example, the falling edge of ResetIn combined with changeable
values on the external bus, causes that device to latch the
contents of the external bus into the internal device ID register
(Device[0:7]).

To reset all devices on a bus, a master sets the
ResetIn line of the first device to a "1" for long enough to
ensure that all devices on the bus have been reset (4 cycles
times the number of devices; note that the maximum number of
devices on the preferred bus configuration is 256 (8 bits), so
that 1024 cycles is always enough time to reset all devices).
Then ResetIn is dropped to "0" and the BusData lines are driven
with the first followed by successive device ID numbers, changing
after every 4 clock pulses. Successive devices set those device
ID numbers into the corresponding device ID register as the
falling edge of ResetIn propagates through the shift registers of
the daisy-chained devices. Figure 14 shows ResetIn at a first
device going low while a master drives a first device ID onto the
bus data lines BusData[0:3]. The first device then latches in
that first device ID. After four clock cycles, the master
changes BusData[0:3] to the next device ID number and ResetOut at
the first device goes low, which pulls ResetIn for the next
daisy-chained device low, allowing the next device to latch in
the next device ID number from BusData[0:3]. In the preferred
embodiment, one master is assigned device ID 0 and it is the
responsibility of that master to control the ResetIn line and to
drive successive device ID numbers onto the bus at the
appropriate times. In the preferred embodiment, each device
waits two clock cycles after ResetIn goes low before latching in
a device ID number from BusData[0:3].
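
The timing of this daisy-chained assignment can be checked with a small clock-by-clock simulation. This is a sketch under stated assumptions: the function name and its parameters are hypothetical, the first slave is given ID 1 because the configuring master holds ID 0, and each device is modelled as sampling ResetIn into a four-stage shift register whose last stage drives ResetOut.

```python
def assign_device_ids(num_devices, first_id=1, stages=4, latch_delay=2):
    """Simulate ID latching along a ResetIn/ResetOut daisy chain."""
    # After a long reset, every shift-register stage holds logical 1.
    shift = [[1] * stages for _ in range(num_devices)]
    prev_in = [1] * num_devices        # ResetIn seen on the previous clock
    fall_cycle = [None] * num_devices  # clock on which ResetIn fell
    ids = [None] * num_devices

    last_cycle = stages * (num_devices - 1) + latch_delay
    for t in range(last_cycle + 1):
        # The master changes the ID on the bus every `stages` clocks.
        bus_id = first_id + t // stages
        # Device 0 sees the master's ResetIn (low from t = 0 on); each
        # later device sees the previous device's ResetOut (last stage).
        reset_in = [0] + [shift[k][-1] for k in range(num_devices - 1)]
        for k in range(num_devices):
            if prev_in[k] == 1 and reset_in[k] == 0:
                fall_cycle[k] = t
            if fall_cycle[k] is not None and t == fall_cycle[k] + latch_delay:
                ids[k] = bus_id        # latch the ID currently on the bus
            # Rising clock edge: shift ResetIn into the first stage.
            shift[k] = [reset_in[k]] + shift[k][:-1]
            prev_in[k] = reset_in[k]
    return ids


if __name__ == "__main__":
    print(assign_device_ids(4))
```

Because the falling edge reaches device k after 4k clocks and the bus ID also advances every 4 clocks, latching two clocks after the edge samples each ID in the middle of its window.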

Persons skilled in the art recognize that longer device
ID numbers could be distributed to devices by having each device
read in multiple bytes from the bus and latch the values into the
device ID register. Persons skilled in the art also recognize
that there are alternative ways of getting device ID numbers to
unique devices. For instance, a series of sequential numbers
could be clocked along the ResetIn line and at a certain time
each device could be instructed to latch the current reset shift
register value into the device ID register.

The configuration master should choose and set an
access time in each access-time register in each slave to a
period sufficiently long to allow the slave to perform an actual,
desired memory access. For example, for a normal DRAM access,
this time must be longer than the row address strobe (RAS) access
time. If this condition is not met, the slave may not deliver
the correct data. The value stored in a slave access-time
register is preferably one-half the number of bus cycles for
which the slave device should wait before using the bus in
response to a request. Thus an access time value of '1' would
indicate that the slave should not access the bus until at least
two cycles after the last byte of the request packet has been
received. The value of AccessReg0 is preferably fixed at 8
(cycles) to facilitate access to control registers.



The bus architecture of this invention can include more
than one master device. The reset or initialization sequence
should also include a determination of whether there are multiple
masters on the bus, and if so to assign unique master ID numbers
to each. Persons skilled in the art will recognize that there
are many ways of doing this. For instance, the master could poll
each device to determine what type of device it is, for example,
by reading a special register then, for each master device, write
the next available master ID number into a special register.

ECC

Error detection and correction ("ECC") methods well
known in the art can be implemented in this system. ECC
information typically is calculated for a block of data at the
time that block of data is first written into memory. The data
block usually has an integral binary size, e.g. 256 bits, and the
ECC information uses significantly fewer bits. A potential
problem arises in that each binary data block in prior art
schemes typically is stored with the ECC bits appended, resulting
in a block size that is not an integral binary power.

In a preferred embodiment of this invention, ECC
information is stored separately from the corresponding data,
which can then be stored in blocks having integral binary size.
ECC information and corresponding data can be stored, for
example, in separate DRAM devices. Data can be read without ECC
using a single request packet, but to write or read error-
corrected data requires two request packets, one for the data and
a second for the corresponding ECC information. ECC information
may not always be stored permanently and in some situations the
ECC information may be available without sending a request packet
or without a bus data block transfer.

In a preferred embodiment, a standard data block size
can be selected for use with ECC, and the ECC method will
determine the required number of bits of information in a
corresponding ECC block. RAMs containing ECC information can be
programmed to store an access time that is equal to: (1) the
access time of the normal RAM (containing data) plus the time to
access a standard data block (for corrected data) minus the time
to send a request packet (6 bytes); or (2) the access time of a
normal RAM minus the time to access a standard ECC block minus
the time to send a request packet. To read a data block and the
corresponding ECC block, the master simply issues a request for
the data immediately followed by a request for the ECC block.
The ECC RAM will wait for the selected access time then drive its
data onto the bus right after (in case (1) above) the data RAM
has finished driving out the data block. Persons skilled in the
art will recognize that the access time described in case (2)
above can be used to drive ECC data before the data is driven
onto the bus lines and will recognize that writing data can be
done by analogy with the method described for a read. Persons
skilled in the art will also recognize the adjustments that must
be made in the bus-busy structure and the request packet
arbitration methods of this invention in order to accommodate
these paired ECC requests.
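
The two ECC access-time choices can be worked through numerically. In this sketch all times are in bus cycles, one byte is assumed to transfer per cycle, the access time is assumed to be counted from the end of the request packet, and the function names and example numbers are hypothetical.

```python
REQUEST_CYCLES = 6  # time to send one six-byte request packet


def ecc_access_after_data(data_access, data_block):
    """Case (1): ECC block follows directly after the data block."""
    return data_access + data_block - REQUEST_CYCLES


def ecc_access_before_data(data_access, ecc_block):
    """Case (2): ECC block finishes just as the data block starts."""
    return data_access - ecc_block - REQUEST_CYCLES


if __name__ == "__main__":
    data_access, data_block, ecc_block = 20, 32, 4   # assumed values
    data_req_end = REQUEST_CYCLES                    # data request: cycles 0-5
    ecc_req_end = data_req_end + REQUEST_CYCLES      # ECC request: cycles 6-11
    data_start = data_req_end + data_access
    data_end = data_start + data_block

    after = ecc_req_end + ecc_access_after_data(data_access, data_block)
    before = ecc_req_end + ecc_access_before_data(data_access, ecc_block)
    print("data block:", data_start, "to", data_end)
    print("case (1) ECC starts at", after)
    print("case (2) ECC ends at", before + ecc_block)
```

In case (1) the ECC transfer begins on the cycle the data transfer ends, and in case (2) it ends on the cycle the data transfer begins, so the paired transfers occupy the bus back to back.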

Since this system is quite flexible, the system
designer can choose the size of the data blocks and the number of
ECC bits using the memory devices of this invention. Note that
the data stream on the bus can be interpreted in various ways.
For instance the sequence can be 2^m data bytes followed by 2^n ECC
bytes (or vice versa), or the sequence can be 2^k iterations of 6
data bytes plus 1 ECC byte. Other information, such as
information used by a directory-based cache coherence scheme, can
also be managed this way. See, for example, Anant Agarwal, et
al., "Scalable Directory Schemes for Cache Consistency," 15th
International Symposium on Computer Architecture, June 1988, pp.
280-289. Those skilled in the art will recognize alternative
methods of implementing ECC schemes that are within the teachings
of this invention.

Low Power 3-D Packaging 

Another major advantage of this invention is that it
drastically reduces the memory system power consumption. Nearly
all the power consumed by a prior art DRAM is dissipated in
performing row access. By using a single row access in a single
RAM to supply all the bits for a block request (compared to a
row-access in each of multiple RAMs in conventional memory
systems) the power per bit can be made very small. Since the
power dissipated by memory devices using this invention is
significantly reduced, the devices potentially can be placed much
closer together than with conventional designs.

The bus architecture of this invention makes possible
an innovative 3-D packaging technology. By using a narrow,
multiplexed (time-shared) bus, the pin count for an arbitrarily
large memory device can be kept quite small - on the order of 20
pins. Moreover, this pin count can be kept constant from one
generation of DRAM density to the next. The low power
dissipation allows each package to be smaller, with narrower pin
pitches (spacing between the IC pins). With current surface
mount technology supporting pin pitches as low as 20 mils, all
off-device connections can be implemented on a single edge of the
memory device. Semiconductor die useful in this invention
preferably have connections or pads along one edge of the die
which can then be wired or otherwise connected to the package
pins with wires having similar lengths. This geometry also
allows for very short leads, preferably with an effective lead
length of less than 4 mm. Furthermore, this invention uses only
bused interconnections, i.e., each pad on each device is
connected by the bus to the corresponding pad of each other
device.

The use of a low pin count and an edge-connected bus
permits a simple 3-D package, whereby the devices are stacked and
the bus is connected along a single edge of the stack. The fact
that all of the signals are bused is important for the
implementation of a simple 3-D structure. Without this, the
complexity of the "backplane" would be too difficult to make cost
effectively with current technology. The individual devices in a
stack of the present invention can be packed quite tightly
because of the low power dissipated by the entire memory system,
permitting the devices to be stacked bumper-to-bumper or top to
bottom. Conventional plastic-injection molded small outline (SO)
packages can be used with a pitch of about 2.5 mm (100 mils), but
the ultimate limit would be the device die thickness, which is
about an order of magnitude smaller, 0.2-0.5 mm using current
wafer technology.

> - . 

Bos Bl«ctrical Description 

By using devices with very low power dissipation and
close physical packing, the bus can be made quite short, which in
turn allows for short propagation times and high data rates. The
bus of a preferred embodiment of the present invention consists
of a set of resistor-terminated controlled impedance transmission
lines which can operate up to a data rate of 500 MHz (2 ns
cycles). The characteristics of the transmission lines are
strongly affected by the loading caused by the DRAMs (or other
slaves) mounted on the bus. These devices add lumped capacitance
to the lines which both lowers the impedance of the lines and
decreases the transmission speed. In the loaded environment, the
bus impedance is likely to be on the order of 25 ohms and the
propagation velocity about c/4 (c = the speed of light) or 7.5
cm/ns. To operate at a 2 ns data rate, the transit time on the
bus should preferably be kept under 1 ns, to leave 1 ns for the
setup and hold time of the input receivers (described below) plus
clock skew. Thus the bus lines must be kept quite short, under
about 8 cm for maximum performance. Lower performance systems
may have much longer lines, e.g. a 4 ns bus may have 24 cm lines
(3 ns transit time, 1 ns setup and hold time).
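
The length limits quoted above follow from the transit time budget
and the loaded propagation velocity. The short Python sketch below
reproduces that arithmetic. The function name and the default
arguments are illustrative assumptions, and the 1 ns allowance for
setup, hold and skew is simply the figure used in the text.

```python
SPEED_OF_LIGHT_CM_PER_NS = 30.0  # approximate speed of light in cm/ns


def max_bus_length_cm(cycle_ns, setup_hold_skew_ns=1.0, velocity_fraction=0.25):
    """Longest bus whose end-to-end transit fits in the cycle time budget.

    velocity_fraction=0.25 models the loaded propagation velocity of c/4.
    """
    transit_budget_ns = cycle_ns - setup_hold_skew_ns
    velocity_cm_per_ns = SPEED_OF_LIGHT_CM_PER_NS * velocity_fraction
    return transit_budget_ns * velocity_cm_per_ns


print(max_bus_length_cm(2.0))  # 2 ns bus: 7.5 cm, i.e. under about 8 cm
print(max_bus_length_cm(4.0))  # 4 ns bus: 22.5 cm, near the quoted 24 cm
```

The 24 cm figure in the text corresponds to rounding the velocity up
to about 8 cm/ns; the sketch keeps the unrounded 7.5 cm/ns.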

In the preferred embodiment, the bus uses current
source drivers. Each output must be able to sink 50 mA, which
provides an output swing of about 500 mV or more. In the
preferred embodiment of this invention, the bus is active low.
The unasserted state (the high value) is preferably considered a
logical zero, and the asserted value (low state) is therefore a
logical 1. Those skilled in the art understand that the method
of this invention can also be implemented using the opposite
logical relation to voltage. The value of the unasserted state
is set by the voltage on the termination resistors, and should be
high enough to allow the outputs to act as current sources, while
being as low as possible to reduce power dissipation. These
constraints may yield a termination voltage about 2V above ground
in the preferred implementation. Current source drivers cause
the output voltage to be proportional to the sum of the sources
driving the bus.
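
As a rough illustration of the current-mode signalling described
above, the following Python sketch computes the steady-state line
voltage for a given number of asserted drivers. The effective load
resistance is an assumption: it is inferred here as the quoted swing
divided by the quoted drive current and is not a value stated in
this specification.

```python
TERMINATION_V = 2.0      # termination voltage, about 2 V above ground
DRIVE_CURRENT_A = 0.050  # each asserted output sinks 50 mA
# Assumed effective load seen by one driver: 0.5 V swing / 50 mA = 10 ohms.
EFFECTIVE_LOAD_OHMS = 10.0


def bus_voltage(active_drivers):
    """Line voltage when `active_drivers` current sources pull the bus low."""
    swing = active_drivers * DRIVE_CURRENT_A * EFFECTIVE_LOAD_OHMS
    return TERMINATION_V - swing


for n in range(3):
    print(n, round(bus_voltage(n), 3))
```

With these assumed values one driver lowers the line by about 0.5 V
and two overlapping drivers lower it by about 1 V, which is the
proportionality the paragraph describes.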

Referring to Figure 7, although there is no stable
condition where two devices drive the bus at the same time,
conditions can arise because of propagation delay on the wires
where one device, A 41, can start driving its part of the bus 44
while the bus is still being driven by another device, B 42
(already asserting a logical 1 on the bus). In a system using
current drivers, when B 42 is driving the bus (before time 46),
the value at points 44 and 45 is logical 1. If B 42 switches off
at time 46 just when A 41 switches on, the additional drive by
device A 41 causes the voltage at the output 44 of A 41 to drop
briefly below the normal value. The voltage returns to its
normal value at time 47 when the effect of device B 42 turning
off is felt. The voltage at point 45 goes to logical 0 when
device B 42 turns off, then drops at time 47 when the effect of
device A 41 turning on is felt. Since the logical 1 driven by
current from device A 41 is propagated irrespective of the
previous value on the bus, the value on the bus is guaranteed to
settle after one time of flight (tf) delay, that is, the time it
takes a signal to propagate from one end of the bus to the other.
If a voltage drive was used (as in ECL wired-ORing), a logical 1
on the bus (from device B 42 being previously driven) would
prevent the transition put out by device A 41 being felt at the
most remote part of the system, e.g., device 43, until the
turnoff waveform from device B 42 reached device A 41 plus one
time of flight delay, giving a worst case settling time of twice
the time of flight delay.
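
The settling-time comparison above can be put in numbers with a small
Python sketch. The bus length and velocity defaults are the example
figures from the Bus Electrical Description; the function names are
hypothetical.

```python
def time_of_flight_ns(bus_length_cm=8.0, velocity_cm_per_ns=7.5):
    """Time for a signal to travel from one end of the bus to the other."""
    return bus_length_cm / velocity_cm_per_ns


def worst_case_settling_ns(drive, bus_length_cm=8.0, velocity_cm_per_ns=7.5):
    """Worst case settling time for current-mode or voltage-mode drive."""
    tf = time_of_flight_ns(bus_length_cm, velocity_cm_per_ns)
    if drive == "current":
        return tf        # settles after one time of flight
    if drive == "voltage":
        return 2 * tf    # turnoff must arrive first, then one more flight
    raise ValueError("drive must be 'current' or 'voltage'")


print(worst_case_settling_ns("current"))
print(worst_case_settling_ns("voltage"))
```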

Clocking 

Clocking a high speed bus accurately without
introducing error due to propagation delays can be implemented by
having each device monitor two bus clock signals and then derive
internally a device clock, the true system clock. The bus clock
information can be sent on one or two lines to provide a
mechanism for each bused device to generate an internal device
clock with zero skew relative to all the other device clocks.
Referring to Figure 8, in the preferred implementation, a bus
clock generator 50 at one end of the bus propagates an early bus
clock signal in one direction along the bus, for example on line
53 from left to right, to the far end of the bus. The same clock
signal then is passed through the direct connection shown to a
second line 54, and returns as a late bus clock signal along the
bus from the far end to the origin, propagating from right to
left. A single bus clock line can be used if it is left
unterminated at the far end of the bus, allowing the early bus
clock signal to reflect back along the same line as a late bus
clock signal.

Figure 8b illustrates how each device 51, 52 receives
each of the two bus clock signals at a different time (because of
propagation delay along the wires), with constant midpoint in
time between the two bus clocks along the bus. At each device
51, 52, the rising edge 55 of Clock1 53 is followed by the rising
edge 56 of Clock2 54. Similarly, the falling edge 57 of Clock1
53 is followed by the falling edge 58 of Clock2 54. This
waveform relationship is observed at all other devices along the
bus. Devices which are closer to the clock generator have a
greater separation between Clock1 and Clock2 relative to devices
farther from the generator because of the longer time required
for each clock pulse to traverse the bus and return along line
54, but the midpoint in time 59, 60 between corresponding rising
or falling edges is fixed because, for any given device, the
length of each clock line between the far end of the bus and that
device is equal. Each device must sample the two bus clocks and
generate its own internal device clock at the midpoint of the
two.
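
The constant-midpoint property can be checked numerically. The Python
sketch below models a clock generator at position 0 and a turnaround
at the far end of the bus; the positions, bus length and velocity
are illustrative assumptions rather than values tied to Figure 8.

```python
def clock_arrivals_ns(position_cm, bus_length_cm=8.0, velocity_cm_per_ns=7.5):
    """Arrival times of the early and late bus clock edges at one device."""
    # Early clock travels from the generator directly to the device.
    early = position_cm / velocity_cm_per_ns
    # Late clock travels to the far end and back to the device.
    late = (2 * bus_length_cm - position_cm) / velocity_cm_per_ns
    return early, late


def device_clock_ns(position_cm, bus_length_cm=8.0, velocity_cm_per_ns=7.5):
    """Internal device clock edge, placed midway between the two arrivals."""
    early, late = clock_arrivals_ns(position_cm, bus_length_cm, velocity_cm_per_ns)
    return (early + late) / 2


for x in (0.0, 4.0, 8.0):
    early, late = clock_arrivals_ns(x)
    print(x, round(late - early, 3), round(device_clock_ns(x), 3))
```

The separation between the two clocks shrinks toward the far end of
the bus, while the midpoint stays at the one-way transit time for
every device.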

Clock distribution problems can be further reduced by
using a bus clock and device clock rate equal to the bus cycle
data rate divided by two, that is, the bus clock period is twice
the bus cycle period. Thus a 500 MHz bus preferably uses a 250
MHz clock rate. This reduction in frequency provides two
benefits. First it makes all signals on the bus have the same
worst case data rates - data on a 500 MHz bus can only change
every 2 ns. Second, clocking at half the bus cycle data rate
makes the labeling of the odd and even bus cycles trivial, for
example, by defining even cycles to be those when the internal
device clock is 0 and odd cycles when the internal device clock
is 1.
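
A minimal Python sketch of the half-rate clock and the resulting
even/odd labeling follows. It assumes, for illustration only, that
the internal device clock is 0 during the first bus cycle counted;
the function names are hypothetical.

```python
def bus_clock_mhz(bus_data_rate_mhz):
    """Bus clock rate is half the bus cycle data rate."""
    return bus_data_rate_mhz / 2


def cycle_parity(device_clock_level):
    """Label a bus cycle from the internal device clock level (0 or 1)."""
    return "even" if device_clock_level == 0 else "odd"


def label_cycles(n_cycles):
    """Labels for consecutive bus cycles; the clock level alternates each cycle."""
    return [cycle_parity(cycle % 2) for cycle in range(n_cycles)]


print(bus_clock_mhz(500))  # 250.0
print(label_cycles(4))     # ['even', 'odd', 'even', 'odd']
```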

Multiple Buses

The limitation on bus length described above restricts
the total number of devices that can be placed on a single bus.
Using 2.5 mm spacing between devices, a single 8 cm bus will hold
about 32 devices. Persons skilled in the art will recognize
certain applications of the present invention wherein the overall
data rate on the bus is adequate but memory or processing
requirements necessitate a much larger number of devices (many
more than 32). Larger systems can easily be built using the
teachings of this invention by using one or more memory
subsystems, designated primary bus units, each of which consists
of two or more devices, typically 32 or close to the maximum
allowed by bus design requirements, connected to a transceiver
device.

Referring to Figure 9, each primary bus unit can be
mounted on a single circuit board 66, sometimes called a memory
stick. Each transceiver device 19 in turn connects to a
transceiver bus 65, similar or identical in electrical and other
respects to the primary bus 18 described at length above. In a
preferred implementation, all masters are situated on the
transceiver bus so there are no transceiver delays between
masters and all memory devices are on primary bus units so that
all memory accesses experience an equivalent transceiver delay,
but persons skilled in the art will recognize how to implement
systems which have masters on more than one bus unit and memory
devices on the transceiver bus as well as on primary bus units.
In general, each teaching of this invention which refers to a
memory device can be practiced using a transceiver device and one
or more memory devices on an attached primary bus unit. Other
devices, generically referred to as peripheral devices, including
disk controllers, video controllers or I/O devices can also be
attached to either the transceiver bus or a primary bus unit, as
desired. Persons skilled in the art will recognize how to use a
single primary bus unit or multiple primary bus units as needed
with a transceiver bus in certain system designs.

The transceivers are quite simple in function. They
detect request packets on the transceiver bus and transmit them
to their primary bus unit. If the request packet calls for a
write to a device on a transceiver's primary bus unit, that
transceiver keeps track of the access time and block size and
forwards all data from the transceiver bus to the primary bus
unit during that time. The transceivers also watch their primary
bus unit, forwarding any data that occurs there to the
transceiver bus. The high speed of the buses means that the
transceivers will need to be pipelined, and will require an
additional one or two cycle delay for data to pass through the
transceiver in either direction. Access times stored in masters
on the transceiver bus must be increased to account for
transceiver delay but access times stored in slaves on a primary
bus unit should not be modified.

Persons skilled in the art will recognize that a more
sophisticated transceiver can control transmissions to and from
primary bus units. An additional control line, TrncvrRW, can be
bused to all devices on the transceiver bus, using that line in
conjunction with the AddrValid line to indicate to all devices on
the transceiver bus that the information on the data lines is: 1)
a request packet, 2) valid data to a slave, 3) valid data from a
slave, or 4) invalid data (or idle bus). Using this extra
control line obviates the need for the transceivers to keep track
of when data needs to be forwarded from its primary bus to the
transceiver bus - all transceivers send all data from their
primary bus to the transceiver bus whenever the control signal
indicates condition 2) above. In a preferred implementation of
this invention, if AddrValid and TrncvrRW are both low, there is
no bus activity and the transceivers should remain in an idle
state. A controller sending a request packet will drive
AddrValid high, indicating to all devices on the transceiver bus
that a request packet is being sent which each transceiver should
forward to its primary bus unit. Each controller seeking to
write to a slave should drive both AddrValid and TrncvrRW high,
indicating valid data for a slave is present on the data lines.
Each transceiver device will then transmit all data from the
transceiver bus lines to each primary bus unit. Any controller
expecting to receive information from a slave should also drive
the TrncvrRW line high, but not drive AddrValid, thereby
indicating to each transceiver to transmit any data coming from
any slave on its primary local bus to the transceiver bus. A
still more sophisticated transceiver would recognize signals
addressed to or coming from its primary bus unit and transmit
signals only at requested times.
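
The four bus conditions signalled by the two control lines can be
summarized as a small decoder. The Python sketch below follows the
signal levels given for the preferred implementation above; the
enumeration, the function names and the direction strings are
hypothetical and are not part of this specification.

```python
from enum import Enum


class BusState(Enum):
    IDLE = "invalid data or idle bus"
    REQUEST_PACKET = "request packet"
    DATA_TO_SLAVE = "valid data to a slave"
    DATA_FROM_SLAVE = "valid data from a slave"


def decode(addr_valid, trncvr_rw):
    """Map the AddrValid and TrncvrRW levels (True = high) to a bus state."""
    if addr_valid and trncvr_rw:
        return BusState.DATA_TO_SLAVE
    if addr_valid:
        return BusState.REQUEST_PACKET
    if trncvr_rw:
        return BusState.DATA_FROM_SLAVE
    return BusState.IDLE


def forward_direction(state):
    """Direction in which a transceiver forwards, or None when idle."""
    if state in (BusState.REQUEST_PACKET, BusState.DATA_TO_SLAVE):
        return "transceiver bus -> primary bus"
    if state is BusState.DATA_FROM_SLAVE:
        return "primary bus -> transceiver bus"
    return None


print(decode(True, False), forward_direction(decode(True, False)))
```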

An example of the physical mounting of the transceivers
is shown in Figure 9. One important feature of this physical
arrangement is to integrate the bus of each transceiver 19 with
the original bus of DRAMs or other devices 15, 16, 17 on the
primary bus unit 66. The transceivers 19 have pins on two sides,
and are preferably mounted flat on the primary bus unit with a
first set of pins connected to primary bus 18. A second set of
transceiver pins 20, preferably orthogonal to the first set of
pins, are oriented to allow the transceiver 19 to be attached to
the transceiver bus 65 in much the same way as the DRAMs were
attached to the primary bus unit. The transceiver bus can be
generally planar and in a different plane, preferably orthogonal
to the plane of each primary bus unit. The transceiver bus can
also be generally circular with primary bus units mounted
perpendicular and tangential to the transceiver bus.

Using this two level scheme allows one to easily build
a system that contains over 500 slaves (16 buses of 32 DRAMs
each). Persons skilled in the art can modify the device ID
scheme described above to accommodate more than 256 devices, for
example by using a longer device ID or by using additional
registers to hold some of the device ID. This scheme can be
extended in yet a third dimension to make a second-order
transceiver bus, connecting multiple transceiver buses by
aligning transceiver bus units parallel to and on top of each
other and busing corresponding signal lines through a suitable
transceiver. Using such a second-order transceiver bus, one
could connect many thousands of slave devices into what is
effectively a single bus.
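
The device counts in this section can be reproduced with a few lines
of Python. The sketch below is illustrative only: the function names
are hypothetical, and the 8-bit width of the existing device ID is
inferred from the 256-device figure above rather than stated here.

```python
def devices_per_bus(bus_length_mm=80.0, pitch_mm=2.5):
    """Number of devices that fit along one bus at the given spacing."""
    return int(bus_length_mm // pitch_mm)


def total_slaves(primary_buses=16, slaves_per_bus=32):
    """Slaves reachable through a two level (transceiver) arrangement."""
    return primary_buses * slaves_per_bus


def id_bits_needed(device_count):
    """Smallest device ID width able to address `device_count` devices."""
    return max(1, (device_count - 1).bit_length())


print(devices_per_bus())               # 32
print(total_slaves())                  # 512
print(id_bits_needed(total_slaves()))  # 9, one more bit than a 256-device ID
```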

Device Interface

The device interface to the high-speed bus can be
divided into three main parts. The first part is the electrical
interface. This part includes the input receivers, bus drivers
and clock generation circuitry. The second part contains the
address comparison circuitry and timing registers. This part
takes the input request packet and determines if the request is
for this device, and if it is, starts the internal access and
delivers the data to the pins at the correct time. The final
part, specifically for memory devices such as DRAMs, is the DRAM
column access path. This part needs to provide bandwidth into
and out of the DRAM sense amps greater than the bandwidth
provided by conventional DRAMs. The implementation of the
electrical interface and DRAM column access path are described in
more detail in the following sections. Persons skilled in the
art recognize how to modify prior-art address comparison
circuitry and prior-art register circuitry in order to practice
the present invention.

Electrical Interface Input/Output Circuitry 

A block diagram of the preferred input/output circuit
for address/data/control lines is shown in Figure 10. This
circuitry is particularly well-suited for use in DRAM devices but
can be used or modified by one skilled in the art for use in
other devices connected to the bus of this invention. It
consists of a set of input receivers 71, 72 and output driver 76
connected to input/output line 69 and pad 75 and circuitry to use
the internal clock 73 and internal clock complement 74 to drive
the input interface. The clocked input receivers take advantage
of the synchronous nature of the bus. To further reduce the
performance requirements for device input receivers, each device
pin, and thus each bus line, is connected to two clocked
receivers, one to sample the even cycle inputs, the other to
sample the odd cycle inputs. By thus de-multiplexing the input
at the pin, each clocked amplifier is given a full 2 ns cycle
to amplify the bus low-voltage-swing signal into a full value
CMOS logic signal. Persons skilled in the art will recognize
additional clocked input receivers can be used within the
teachings of this invention. For example, four input receivers
could be connected to each device pin and clocked by a modified
internal device clock to transfer sequential bits from the bus to
internal device circuits, allowing still higher external bus
speeds or still longer settling times to amplify the bus
low-voltage-swing signal into a full value CMOS logic signal.
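
The de-multiplexing of one pin onto several clocked receivers can be
pictured as dealing successive bus cycles out in round-robin order.
The Python sketch below is a behavioral illustration only, with
hypothetical function names; it does not model the receiver
circuitry of Figure 10.

```python
def demultiplex(bus_bits, receivers=2):
    """Deal successive bus cycles to the receivers on one pin in turn.

    With two receivers, index 0 holds the even cycles and index 1 the odd.
    """
    return [bus_bits[i::receivers] for i in range(receivers)]


def sample_interval_ns(bus_cycle_ns=2.0, receivers=2):
    """Time between successive samples taken by any one receiver."""
    return bus_cycle_ns * receivers


even, odd = demultiplex([1, 0, 1, 1, 0, 0])
print(even, odd)              # [1, 1, 0] [0, 1, 0]
print(sample_interval_ns())   # 4.0
```

Adding receivers lengthens the interval between samples at each
receiver, which is what permits higher bus speeds or longer settling
times.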

The output drivers are quite simple, and consist of a
single NMOS pulldown transistor 76. This transistor is sized so
that under worst case conditions it can still sink the 50 mA
required by the bus. For 0.8 micron CMOS technology, the
transistor will need to be about 200 microns long. Overall bus
performance can be improved by using feedback techniques to
control output transistor current so that the current through the
device is roughly 50 mA under all operating conditions, although
this is not absolutely necessary for proper bus operation. An
example of one of many methods known to persons skilled in the
art for using feedback techniques to control current is described
in Hans Schumacher, et al., "CMOS Subnanosecond True-ECL Output
Buffer," J. Solid State Circuits, vol. 25 (1), pp. 150-154 (Feb.
1990). Controlling this current improves performance and reduces
power dissipation. This output driver, which can be operated at
500 MHz, can in turn be controlled by a suitable multiplexer with
two or more (preferably four) inputs connected to other internal
chip circuitry, all of which can be designed according to well
known prior art.

The input receivers of every slave must be able to
operate during every cycle to determine whether the signal on the
bus is a valid request packet. This requirement leads to a
number of constraints on the input circuitry. In addition to
requiring small acquisition and resolution delays, the circuits
must take little or no DC power, little AC power and inject very
little current back into the input or reference lines. The
standard clocked DRAM sense amp shown in Figure 11 satisfies all
these requirements except the need for low input currents. When
this sense amp goes from sense to sample, the capacitance of the
internal nodes 83 and 84 in Figure 11 is discharged through the
reference line 68 and input 69, respectively. This particular
current is small, but the sum of such currents from all the
inputs into the reference lines summed over all devices can be
reasonably large.

The fact that the sign of the current depends upon
the previous received data makes matters worse. One way to solve
this problem is to divide the sample period into two phases.
During the first phase, the inputs are shorted to a buffered 
version of the reference level (which may have an offset). 
During the second phase, the inputs are connected to the true 
inputs. This scheme does not remove the input current 
completely, since the input must still charge nodes 83 and 84 
from the reference value to the current input value, but it does 
reduce the total charge required by about a factor of 10 
(requiring only a 0.25V change rather than a 2.5V change). 
Persons skilled in the art will recognize that many other methods 
can be used to provide a clocked amplifier that will operate on
very low input currents. 

One important part of the input/output circuitry
generates an internal device clock based on early and late bus
clocks. Controlling clock skew (the difference in clock timing
between devices) is important in a system running with 2 ns
cycles, thus the internal device clock is generated so the input
sampler and the output driver operate as close in time as
possible to midway between the two bus clocks.



A block diagram of the internal device clock generating
circuit is shown in Figure 12 and the corresponding timing
diagram in Figure 13. The basic idea behind this circuit is
relatively simple. A DC amplifier 102 is used to convert the
small-swing bus clock into a full-swing CMOS signal. This signal
is then fed into a variable delay line 103. The output of delay
line 103 feeds three additional delay lines: 104 having a fixed
delay; 105 having the same fixed delay plus a second variable
delay; and 106 having the same fixed delay plus one half of the
second variable delay. The outputs 107, 108 of the delay lines
104 and 105 drive clocked input receivers 101 and 111 connected
to early and late bus clock inputs 100 and 110, respectively.
These input receivers 101 and 111 have the same design as the
receivers described above and shown in Fig. 11. Variable delay
lines 103 and 105 are adjusted via feedback lines 116, 115 so
that input receivers 101 and 111 sample the bus clocks just as
they transition. Delay lines 103 and 105 are adjusted so that
the falling edge 120 of output 107 precedes the falling edge 121
of the early bus clock, Clock1 53, by an amount of time 128 equal
to the delay in input sampler 101. Delay line 108 is adjusted in
the same way so that falling edge 122 precedes the falling edge
123 of late bus clock, Clock2 54, by the delay 128 in input
sampler 111.

Since the outputs 107 and 108 are synchronized with the
two bus clocks and the output 73 of the last delay line 106 is
midway between outputs 107 and 108, that is, output 73 follows
output 107 by the same amount of time 129 that output 73 precedes
output 108, output 73 provides an internal device clock midway
between the bus clocks. The falling edge 124 of internal device
clock 73 precedes the time of actual input sampling 125 by one
sampler delay. Note that this circuit organization automatically
balances the delay in substantially all device input receivers 71
and 72 (Fig. 10), since outputs 107 and 108 are adjusted so the
bus clocks are sampled by input receivers 101 and 111 just as the
bus clocks transition.
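
The relationship among the three delay line outputs can be expressed
directly. The Python sketch below is a timing model only, with
hypothetical parameter names and arbitrary example delays; it omits
the feedback loops that set the two variable delays.

```python
def delay_line_outputs(t_in, d103, fixed, variable):
    """Edge times at outputs 107, 108 and 73 for an edge entering line 103.

    d103     -- delay of variable delay line 103
    fixed    -- fixed delay shared by lines 104, 105 and 106
    variable -- second variable delay (all of it in 105, half of it in 106)
    """
    base = t_in + d103
    out_107 = base + fixed                 # delay line 104
    out_108 = base + fixed + variable      # delay line 105
    out_73 = base + fixed + variable / 2   # delay line 106
    return out_107, out_108, out_73


out_107, out_108, out_73 = delay_line_outputs(0.0, 0.3, 0.5, 0.4)
print(out_73 - out_107, out_108 - out_73)  # equal intervals (time 129)
```

Whatever values the feedback settles on, output 73 lies midway
between outputs 107 and 108 by construction.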

In the preferred embodiment, two sets of these delay
lines are used, one to generate the true value of the internal
device clock 73, and the other to generate the complement 74
without adding any inverter delay. The dual circuit allows
generation of truly complementary clocks, with extremely small
skew. The complement internal device clock is used to clock the
'even' input receivers to sample at time 127, while the true
internal device clock is used to clock the 'odd' input receivers
to sample at time 125. The true and complement internal device
clocks are also used to select which data is driven to the output
drivers. The gate delay between the internal device clock and
output circuits driving the bus is slightly greater than the
corresponding delay for the input circuits, which means that the
new data always will be driven on the bus slightly after the old
data has been sampled.



DRAM Column Access Modification

A block diagram of a conventional 4 MBit DRAM 130 is
shown in Figure 15. The DRAM memory array is divided into a
number of subarrays 150-157, for example, 8. Each subarray is
divided into arrays 148, 149 of memory cells. Row address
selection is performed by decoders 146. A column decoder 147A,
147B, including column sense amps on either side of the decoder,
runs through the core of each subarray. These column sense amps
can be set to precharge or latch the most-recently stored value,
as described in detail above. Internal I/O lines connect each
set of sense-amps, as gated by corresponding column decoders, to
input and output circuitry connected ultimately to the device
pins. These internal I/O lines are used to drive the data from
the selected bit lines to the data pins (some of pins 131-145),
or to take the data from the pins and write the selected bit
lines. Such a column access path organized by prior art
constraints does not have sufficient bandwidth to interface with
a high speed bus. The method of this invention does not require
changing the overall method used for column access, but does
change implementation details. Many of these details have been
implemented selectively in certain fast memory devices, but never
in conjunction with the bus architecture of this invention.

Running the internal I/O lines in the conventional way
at high bus cycle rates is not possible. In the preferred
method, several (preferably 4) bytes are read or written during
each cycle and the column access path is modified to run at a
lower rate (the inverse of the number of bytes accessed per
cycle, preferably 1/4 of the bus cycle rate). Three different
techniques are used to provide the additional internal I/O lines
required and to supply data to memory cells at this rate. First,
the number of I/O bit lines in each subarray running through the
column decoder 147A, 147B is increased, for example, to 16, eight
for each of the two columns of column sense amps and the column
decoder selects one set of columns from the "top" half 148 of
subarray 150 and one set of columns from the "bottom" half 149
during each cycle, where the column decoder selects one column
sense amp per I/O bit line. Second, each column I/O line is
divided into two halves, carrying data independently over
separate internal I/O lines from the left half 147A and right
half 147B of each subarray (dividing each subarray into
quadrants) and the column decoder selects sense amps from each
right and left half of the subarray, doubling the number of bits
available at each cycle. Thus each column decode selection turns
on n column sense amps, where n equals four (top left and right,
bottom left and right quadrants) times the number of I/O lines in
the bus to each subarray quadrant (8 lines each x 4 = 32 lines in
the preferred implementation). Finally, during each RAS cycle,
two different subarrays, e.g. 157 and 153, are accessed. This
doubles again the available number of I/O lines containing data.
Taken together, these changes increase the internal I/O bandwidth
by at least a factor of 8. Four internal buses are used to route
these internal I/O lines. Increasing the number of I/O lines and
then splitting them in the middle greatly reduces the capacitance
of each internal I/O line which in turn reduces the column access
time, increasing the column access bandwidth even further.

The multiple, gated input receivers described above
allow high speed input from the device pins onto the internal I/O
lines and ultimately into memory. The multiplexed output driver
described above is used to keep up with the data flow available
using these techniques. Control means are provided to select
whether information at the device pins should be treated as an
address, and therefore to be decoded, or input or output data to
be driven onto or read from the internal I/O lines.

Each subarray can access 32 bits per cycle, 16 bits
from the left subarray and 16 from the right subarray. With 8
I/O lines per sense-amplifier column and accessing two subarrays
at a time, the DRAM can provide 64 bits per cycle. This extra
I/O bandwidth is not needed for reads (and is probably not used),
but may be needed for writes. Availability of write bandwidth is
a more difficult problem than read bandwidth because over-writing
a value in a sense-amplifier may be a slow operation, depending
on how the sense amplifier is connected to the bit line. The
extra set of internal I/O lines provides some bandwidth margin
for write operations.
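
The I/O line counts above can be tallied with a short Python sketch.
It is illustrative only: the function names are hypothetical, and
the comparison assumes one byte is transferred per bus cycle, which
is how the 4-byte column access is read here.

```python
def bits_per_column_cycle(lines_per_quadrant=8, quadrants=4, subarrays=2):
    """Internal I/O bits available per column cycle."""
    return lines_per_quadrant * quadrants * subarrays


def column_path_rate_mhz(bus_rate_mhz=500.0, bytes_per_access=4):
    """Column access rate when several bytes are moved per column cycle."""
    return bus_rate_mhz / bytes_per_access


available = bits_per_column_cycle()   # 64 bits from two subarrays
needed = 4 * 8                        # 4 bytes per column cycle
print(available, needed, available - needed)
print(column_path_rate_mhz())         # 125.0
```

Under these assumptions a read needs 32 of the 64 available bits per
column cycle, leaving the remainder as margin for writes.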

Persons skilled in the art will recognize that many
variations of the teachings of this invention can be practiced
that still fall within the claims of this invention which follow.